Supervised Learning for Automatic Classification of Documents using Self-Organizing Maps
Abstract
Automatic document classification that corresponds to user-predefined classes is a challenging and widely researched area. Self-Organizing Maps (SOM) are unsupervised Artificial Neural Networks (ANN) that transform high-dimensional data into a two-dimensional representation, enabling automatic clustering of the input while preserving higher-order topology. A closely related algorithm is Learning Vector Quantization (LVQ), which uses supervised learning to maximize correct data classification. This study presents the application of SOM and LVQ to automatic document classification based on a predefined set of clusters. A set of documents, manually clustered by a domain expert, was used. Experimental results show considerable success of automatic document clustering that matches manual clustering, with a slight advantage for LVQ.
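To make the supervised step concrete, here is a minimal sketch of the basic LVQ1 update rule applied to toy document vectors. It is not the study's implementation: the four-term vocabulary, the term counts, the class names, the learning rate, and the epoch count are all assumptions chosen for illustration, and the function names (`train_lvq1`, `classify`) are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_prototype(prototypes, vector):
    """Index of the prototype whose weight vector is closest to `vector`."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(prototypes[i][0], vector))

def train_lvq1(samples, labels, prototypes, learning_rate=0.3, epochs=20, seed=0):
    """LVQ1: move the winning prototype toward a sample of the same class,
    and away from a sample of a different class."""
    rng = random.Random(seed)
    protos = [(list(weights), cls) for weights, cls in prototypes]  # copy inputs
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rate = learning_rate * (1.0 - epoch / epochs)  # linearly decaying rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            weights, cls = protos[nearest_prototype(protos, x)]
            sign = 1.0 if cls == y else -1.0
            for d in range(len(weights)):
                weights[d] += sign * rate * (x[d] - weights[d])
    return protos

def classify(prototypes, vector):
    """Assign the class label of the nearest prototype."""
    return prototypes[nearest_prototype(prototypes, vector)][1]

# Hypothetical term counts over the assumed vocabulary
# ["goal", "match", "stock", "market"].
samples = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]]
labels = ["sports", "sports", "finance", "finance"]

# One prototype per class, initialized from a labeled example of that class.
initial = [([3, 2, 0, 0], "sports"), ([0, 0, 3, 2], "finance")]
trained = train_lvq1(samples, labels, initial)

print(classify(trained, [4, 1, 0, 0]))
print(classify(trained, [0, 0, 4, 1]))
```

A SOM differs mainly in that it ignores the labels and also updates the neighbors of the winning unit on a two-dimensional grid; LVQ uses the expert-assigned labels directly, which is the supervised element the abstract refers to.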
Similar resources
Semi-Supervised Learning for Web Text Clustering
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both, labeled and unlabeled documents, in a semi-supervised framework is a promising approach to reduc...
Automating Personal Categorization Using Artificial Neural Networks
Organizations as well as personal users invest a great deal of time in assigning documents they read or write to categories. Automatic document classification that matches user subjective classification is widely used, but much challenging research still remains to be done. The self-organizing map (SOM) is an artificial neural network (ANN) that is mathematically characterized by transforming hi...
Self-Organising Maps in Document Classification: A Comparison with Six Machine Learning Methods
This paper focuses on the use of self-organising maps, also known as Kohonen maps, for the classification task of text documents. The aim is to effectively and automatically classify documents to separate classes based on their topics. The classification with self-organising map was tested with three data sets and the results were then compared to those of six well known baseline methods: k-mea...
Som-based Clustering of Textual Documents Using Wordnet
The classification of textual documents has been the subject of many studies. Technologies like the web and numerical libraries facilitated the exponential growth of available documentation. The classification of textual documents is very important since it allows the users to effectively and quickly fly over and understand better the contents of large corpora. Most classification approaches us...
Air Quality Modelling by Kohonen’s Self-organizing Feature Maps and LVQ Neural Networks
The paper presents a design of parameters for air quality modelling and the classification of districts into classes according to their pollution. Further, it presents a model design, data pre-processing, the designs of various structures of Kohonen’s Self-organizing Feature Maps (unsupervised methods), the clustering by K-means algorithm and the classification by Learning Vector Quantization n...